Grouping
Parentheses in regular expressions serve two distinct functions:
1) limiting the scope of alternation to a subexpression;
2) identifying a subexpression (also known as a group) as a single unit.
Limiting the scope of alternation is discussed elsewhere.
Identifying a subexpression allows:
1) the application of quantifiers to more than a single character in the regular expression;
Example
s = "Family cars are built for 4 people, but the average family has 4.2 people in it";
regex.extract ("[0-9]+(\\.[0-9]*)?", @s, @temp.regexList)
» 2
By making the decimal point and any digits after it optional, the pattern finds 2 matches: "4" and "4.2".
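For readers without Frontier at hand, the effect of quantifying a whole group can be checked with a minimal sketch in Python's re module. This illustrates the regular expression only, not Frontier's regex.extract verb; the variable names are assumptions made for the example, and Python raw strings need a single backslash where Frontier strings need two.

```python
import re

s = "Family cars are built for 4 people, but the average family has 4.2 people in it"

# The trailing "?" applies to the whole group (\.[0-9]*), not to one character,
# so the decimal point and the digits after it are optional as a unit.
pattern = re.compile(r"[0-9]+(\.[0-9]*)?")

# group(0) is the text matched by the whole regular expression.
matches = [m.group(0) for m in pattern.finditer(s)]
print(len(matches), matches)
```

Without the parentheses, the "?" would apply only to the character class immediately before it.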
2) capturing text matching the subexpression(s) in addition to capturing text matching the whole regular expression.
Those verbs that use the MatchInfo Table will be able to use the text returned in the groupStrings cell.
Captured subexpressions are returned in the order of their opening parentheses.
Example
s = "Subject: What's today's lesson\r";
regex.easySearch ("(Subject:) +([^\r]*)\r", s, @temp.matchInfo)
» true
In the example, the matched subexpressions are in the list at temp.matchInfo.groupStrings: {"Subject:", "What's today's lesson"}
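The capture order can also be observed with a small sketch in Python's re module, used here only as a stand-in for experimentation; it is not Frontier's MatchInfo Table, and the names groups and nested are assumptions made for the example. The second search uses nested parentheses to show that numbering follows the opening parentheses from left to right.

```python
import re

s = "Subject: What's today's lesson\r"

# Two groups: the literal header, then everything up to the carriage return.
m = re.search(r"(Subject:) +([^\r]*)\r", s)
groups = m.groups()
print(groups)

# Nested groups are numbered by the position of their opening parenthesis:
# group 1 is the outer "(…)", group 2 is "(a)", group 3 is "(b)".
nested = re.search(r"((a)(b))", "ab").groups()
print(nested)
```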
Back references
Regex keeps track of the matched subexpressions via an index. This allows you to refer to the matched subexpressions within the regular expression using the back-reference metacharacter, "\digit", where digit is an integer from 1 to 9.
The backslash, "\", must be escaped (written as "\\") so that Frontier passes it to the regex engine.
Example - finding doubled words
s = "The big black black dog";
regex.easySearch ("\\<(\\w+)\\s+\\1", s, @temp.matchInfo)
» true
The pattern "\\<(\\w+)\\s+\\1" uses three shortcuts ("\\<", "\\w" and "\\s"). The first part, "\\<(\\w+)", identifies a word, with the letters of the word captured in the subexpression "(\\w+)". This subexpression is then referred to by "\\1", after "\\s+" allows for any whitespace.
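A back reference inside the pattern can be tried with a minimal sketch in Python's re module. Two assumptions differ from the Frontier example: Python does not support "\<", so the word boundary "\b" is substituted, and a raw string is used so that each backslash is written once rather than doubled.

```python
import re

s = "The big black black dog"

# \b(\w+) captures a word as group 1; \s+ skips whitespace;
# \1 must then match the same text that group 1 captured.
m = re.search(r"\b(\w+)\s+\1", s)

whole = m.group(0)   # text matched by the whole pattern
word = m.group(1)    # text captured by the first subexpression
print(whole, "/", word)
```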
In the case of regex.subst, backreferences can be used in the replacement text.
Example - replace a doubled word with a single occurrence
s = "The big black black dog";
regex.subst ("\\<(\\w+)\\s+\\1", "\\1", @s)
» true
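The same substitution can be sketched with Python's re.sub for comparison. This is an illustration under stated assumptions, not Frontier's regex.subst: "\b" stands in for "\<", which Python does not support, and re.sub returns a new string instead of changing s in place.

```python
import re

s = "The big black black dog"

# In the replacement text, \1 inserts whatever group 1 captured,
# so the doubled word collapses to a single occurrence.
result = re.sub(r"\b(\w+)\s+\1", r"\1", s)
print(result)
```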